ABSTRACT

Aggregated data in real world recommender applications of- 
ten feature fat-tailed distributions of the number of times 
individual items have been rated or favored. We propose a 
model to simulate such data. The model is mainly based on 
social interactions and opinion formation taking place on a 
complex network with a given topology. A threshold mech- 
anism is used to govern the decision making process that 
determines whether a user is or is not interested in an item. 
We demonstrate the validity of the model by fitting atten- 
dance distributions from different real data sets. The model 
is mathematically analyzed by investigating its master equation.
Our approach is an attempt to understand recommender systems'
data as a social process. The model can
serve as a starting point to generate artificial data sets useful 
for testing and evaluating recommender systems. 
1. INTRODUCTION

This is the information age. We are witnessing information
production and consumption at a speed never seen before. The
Web 2.0 paradigm enables consumers and producers to exchange data
in a collaborative way that benefits both parties. However, one of
the key challenges in our digitally driven society is information
overload [7]. We have the 'pain of choice'. Recommendation systems
represent a possible solution to this problem. They emerged as a
research area of their own in the 1990s [42] [20] [28] [21] [11].
Interest in recommendation systems has increased steadily in recent
years and has attracted researchers from different fields [43]. The
success of highly rated Internet sites such as Amazon, Netflix,
YouTube, Yahoo, Last.fm and others is to a large extent based on
their recommender engines. Corresponding applications recommend
everything from CDs/DVDs, movies, jokes, books, and web sites to
more complex items such as financial services.

The most popular techniques related to recommendation systems are
collaborative filtering [8] [26] [11] [24] [28] [21] [41] [45] and
content-based filtering [14] [40] [35] [5] [30]. In addition,
researchers developed alternative methods inspired by fields as
diverse as machine learning, graph theory, and physics [52] [51]
[10] [48] [50]. Furthermore, recommendation systems have been
investigated in connection with trust [2] [39] [47] [32] [33] and
personalized web search [9] [12] [46], which constitutes the new
research frontier in search engines.

However, there are still many open challenges in the research
field of recommendation systems [1] [22] [25] [18] [24] [43] [15].
One key question is connected to the understanding of the user
rating mechanism. We build on the well documented influence of
social interactions with peers on the decision to vote, favor, or
even purchase an item [44] [27]. We propose a model inspired by
opinion formation taking place on a complex network with a
predefined topology. Our model is able to generate data of the kind
observed in real world recommender systems. Despite its simplicity,
the model is flexible enough to generate a wide range of different
patterns. We mathematically analyze the model using a mean field
approach to the full Master Equation. Our approach provides an
understanding of the data in recommender systems as a product of
social processes. The model can serve as a data generator, which is
valuable for testing and evaluating recommender systems.

The rest of the paper is organized as follows. The model
is outlined in Sec. 2. Methods, data set descriptions, and
validation procedures are in Sec. 3. Results are presented
in Sec. 4. Discussion and an outlook for future research
directions are in Sec. 5.



2. MODEL 
2.1 Motivation

Our daily decisions are heavily influenced by various information
channels: advertisement, broadcasting, social interactions, and
many others. Social ties (word-of-mouth) play a pivotal role in
consumers' buying decisions [44] [27]. It was demonstrated by many
researchers that personal communication and informal information
exchange not only influence purchase decisions and opinions, but
also shape our expectations of a product or service [49] [4] [3].
On the other hand, it was shown [23] that social benefits are a
major motivation to participate in opinion platforms. If somebody
is influenced by recommendations on an opinion platform like
MovieLens or Amazon, social interactions and word-of-mouth in
general are additional forces governing the decision making process
to purchase or even to rate an object in a particular way.

Our model is formulated within an opinion formation frame- 
work where social ties play a major role. We shall discuss 
the following main ingredients of our model: 

• Influence-Network (IN) 

• Intrinsic-Item-Anticipation (IIA) 

• Influence-Dynamics (ID) 

Influence Network. 

We call the network where context-relevant information 
exchange takes place an Influence-Network (IN). Nodes of 
the IN are people and connections between nodes indicate 
the influence among them. Note that we put no constraints 
on the nature of how these connections are realized. They 
may be purely virtual (over the Internet) or based on physi- 
cal meetings. We emphasize that INs are domain dependent, 
i.e., for a given community of users, the Influence Network 
concerning books may differ greatly (in topology, number of 
ties, tie strength, etc.) from that concerning another subject 
such as food or movies. Indeed, one person's opinion lead- 
ers (relevant peers) concerning books may be very different 
from those for food or other subjects. In this scope, we see 
the INs as domain-restricted views on social networks. It 
is thus reasonable to assume that Influence Networks are 
similar to social interaction networks which often exhibit a 
scale-free topology [6] . However, our model is not restricted 
to a particular network structure. 

Intrinsic-Item-Anticipation. 

Suppose a new product is launched on the market. Ad- 
vertisement, marketing campaigns, and other efforts to at- 
tract customers predate the launching process and continue 
after the product started to spread on the market. These 
efforts influence product-dependent customer anticipation. 
It is clear that the resulting anticipation is a complex com- 
bination of many different components including intrinsic 
product quality and possibly also suggestions from recom- 
mendation systems. 

In our model we call the above-described anticipation
Intrinsic-Item-Anticipation (IIA) and measure it by a single number.
It is based on many external sources, except for the influence
generated by social interactions. It is the opinion on something
taken by individuals before they start to discuss the subject with
their peers. Furthermore, we assume that an individual will invest
resources (time/money) into an object only if the
Intrinsic-Item-Anticipation is above a particular threshold, which
we call the Critical-Anticipation-Threshold.

Influence-Dynamics. 

The Influence-Dynamics describes how individuals'
Intrinsic-Item-Anticipations are altered by information exchange via
the connections of the corresponding Influence-Network. From our
model's point of view this means the following: an individual's IIA
for a particular item i may be shifted due to social interactions
with directly connected peers (these interactions thus take place on
the corresponding IN) who already experienced the product or service
in question. This process can shift the Intrinsic-Item-Anticipation
of an individual who did not yet experience product/object i closer
to or beyond the Critical-Anticipation-Threshold.

We now summarize the basic ingredients of our model. 
An individual user's opinions on objects are assembled in 
two consecutive stages: i) opinion making based on different 
external sources, including suggestions by recommendation 
systems and ii) opinion making based on social interactions 
in the Influence-Network. The second process may shift the 
opinions generated by the first process. 

2.2 Mathematical formulation of the model 

In this section we first describe how individuals'
Intrinsic-Item-Anticipations may change due to social interactions
taking place on a particular Influence-Network. Second, we introduce
the dynamical processes governing the opinion propagation.

IIA shift. 

We model a possible shift in the IIA as: 



$$\hat{f}_{ij} = f_{ij} + \left(\frac{\Theta_j}{k_j}\right)^{1-\gamma}, \qquad (1)$$

where $\hat{f}_{ij}$ is the shifted Intrinsic-Item-Anticipation of
individual $j$ for object $i$, $f_{ij}$ is the unbiased IIA,
$\Theta_j$ is the number of $j$'s neighbors who already experienced
and liked item $i$, $k_j$ denotes the total number of $j$'s neighbors
in the corresponding IN, and $\gamma \in (0, 1)$ quantifies the trust
of individuals in their peers. An individual $j$ will consume,
purchase, or positively rate an item $i$ only if

$$\hat{f}_{ij} \geq \Delta. \qquad (2)$$



We identify $\Delta$ as the Critical-Anticipation-Threshold. Values
of $f_{ij}$ are drawn from a probability distribution $f_i$. Since
the IIA for each individual is an aggregate of many different and
largely independent contributions, we assume (unless stated
otherwise) that $f_i$ is normally distributed,
$f_i \in \mathcal{N}(\mu_i, \sigma)$. To mimic different item
anticipations for different objects $i$, we draw the mean $\mu_i$
from a uniform distribution $U(-\epsilon, \epsilon)$. We choose
$\mu_i$, $\epsilon$, and $\sigma$ so that $f_i$ is roughly bounded by
$(-1, 1)$, i.e., $-1 < \mu_i - 3\sigma < \mu_i + 3\sigma < 1$. Note
that $\hat{f}_{ij}$ can exceed these boundaries after a shift of the
corresponding IIA occurs. The second term on the right hand side of
Eq. (1) is the influence of $j$'s neighborhood weighted by trust
$\gamma$. To better understand the interplay between $\gamma$ and the
density of attending users in the neighborhood of user $j$,
$\rho := \Theta_j/k_j$, we refer to Fig. 1. Trust $\gamma \approx 1$
causes a big shift of the IIA even for $\rho \approx 0$. On the other
hand, $\gamma \approx 0$ needs a high $\rho$ to yield a significant
IIA shift. These properties are understood as follows: people who
trust strongly in their peers need only a few positive opinions to be
convinced, whereas people who trust less in their social environment
need considerably more signals to be influenced.
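The interplay between trust and neighborhood density described above can be checked numerically. The following is a minimal sketch of Eq. (1) and the threshold test of Eq. (2), assuming the exponent $1-\gamma$ and the convention that a node without attending neighbors receives no shift; the function names are illustrative and not part of the original model description.

```python
def shifted_iia(f_ij, theta_j, k_j, gamma):
    """Shifted anticipation: f_ij + (theta_j / k_j) ** (1 - gamma), cf. Eq. (1)."""
    if k_j <= 0:
        raise ValueError("k_j must be positive")
    if not 0 <= theta_j <= k_j:
        raise ValueError("theta_j must lie between 0 and k_j")
    if theta_j == 0:
        # Assumption: without attending neighbors there is no social shift.
        return f_ij
    return f_ij + (theta_j / k_j) ** (1.0 - gamma)


def attends(f_hat, delta):
    """Threshold test: the individual attends if the shifted IIA reaches delta."""
    return f_hat >= delta


# One attending neighbor out of ten (rho = 0.1), unbiased IIA 0.1, threshold 0.6.
high_trust = shifted_iia(0.1, 1, 10, gamma=0.9)  # large shift despite small rho
low_trust = shifted_iia(0.1, 1, 10, gamma=0.1)   # shift is close to rho itself
```

With these illustrative numbers the high-trust individual crosses the threshold while the low-trust individual does not, which mirrors the qualitative statement in the text.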

Influence-Dynamics. 

Figure 1: Contour plot for $\gamma$ and $\rho = \Theta_j/k_j$.
Numbers inside the plot quantify the shift in the IIA as a function
of $\gamma$ and $\rho$.

The Influence-Dynamics proceeds as follows. First, we draw an
Influence-Network IN($\mathcal{P}$) with a fixed network topology
(power-law, Erdos-Renyi, or another). $\mathcal{P}$ refers to a set
of appropriate parameters for the Influence-Network in question
(like network type, number of nodes, etc.). The network's topology
is not affected by the dynamical processes (opinion propagation)
taking place on it. We justify this static scenario by assuming that
the time scale of the topology change is much longer than the time
scale¹ of opinion spreading in the network. Each node in the
Influence-Network corresponds to an individual. For each individual
$j$ we draw an unbiased Intrinsic-Item-Anticipation $f_{ij}$ from the
predefined probability distribution $f_i$. At each time step, every
individual is in one of the following states: $\{S, A, D\}$. $S$
refers to a susceptible state and corresponds to the initial state
for all nodes at $t = 0$. $A$ refers to an attender state and
corresponds to an individual with the property
$\hat{f}_{ij} \geq \Delta$. $D$ refers to a denier state with the
property $\hat{f}_{ij} < \Delta$ after an information exchange with
his/her peers in the Influence-Network happened. An individual in
state $D$ or $A$ cannot change his/her state anymore. It is clear
that an individual in state $A$ cannot transform back to the
susceptible state $S$, since he/she did consume or favor item $i$ and
we do not account for multiple attendances in our model. An
individual in state $D$ was influenced but not convinced by his/her
opinion leaders (directly connected peers). We make the following
assumption here: if individual $j$'s opinion leaders are not able to
convince individual $j$, meaning that individual $j$'s
Intrinsic-Item-Anticipation $\hat{f}_{ij}$ stays below the critical
threshold $\Delta$ after the influence process, then we assume that
$j$'s opinion not to attend object $i$ remains unchanged in the
future. Therefore we have the following possible transitions for each
node in the influence network: $j_S \to j_A$ or $j_S \to j_D$. Node
states are updated asynchronously, which is more realistic than
synchronous updating, especially in social interaction models [13].
The Influence-Dynamics is summarized in Algorithm 1.
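The dynamics for a single item can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is passed in as a plain adjacency dictionary, asynchronous updating is approximated by visiting the currently influenced susceptible nodes in random order, and the names `simulate_item`, `neighbors`, and `f` are hypothetical.

```python
import random

S, A, D = "S", "A", "D"


def simulate_item(neighbors, f, delta, gamma, rng):
    """Run the influence dynamics for one item and return the final node states.

    neighbors: dict mapping node -> list of neighbors (static Influence-Network)
    f:         dict mapping node -> unbiased IIA f_ij for this item
    rng:       random.Random instance used to randomize the update order
    """
    # First movers attend without social influence; all others start susceptible.
    state = {j: (A if f[j] >= delta else S) for j in neighbors}
    while True:
        # Susceptible nodes with at least one attending neighbor (Theta_j > 0).
        active = [j for j in neighbors
                  if state[j] == S and any(state[n] == A for n in neighbors[j])]
        if not active:
            return state
        rng.shuffle(active)  # states change one node at a time, in random order
        for j in active:
            theta = sum(state[n] == A for n in neighbors[j])
            f_hat = f[j] + (theta / len(neighbors[j])) ** (1.0 - gamma)
            # A and D are absorbing: the node never returns to S.
            state[j] = A if f_hat >= delta else D


# Path graph 0-1-2-3 with a single first mover (node 0).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
iia = {0: 0.9, 1: 0.0, 2: 0.0, 3: 0.0}
cascade = simulate_item(path, iia, delta=0.6, gamma=0.9, rng=random.Random(0))
blocked = simulate_item(path, iia, delta=0.6, gamma=0.1, rng=random.Random(0))
```

In this toy example high trust lets attendance spread along the whole path, whereas low trust turns node 1 into a denier that shields the remaining susceptible nodes.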

¹ The term time scale denotes a dimensionless quantity and
specifies the divisions of time. A shorter time scale means
a faster spreading of opinions in the network.

Algorithm 1 RecSysMod algorithm. $\mathcal{P}$ contains the
configuration parameters for the network. $\Delta$ is the
Anticipation Threshold and $\gamma$ denotes the trust.
$O \in \mathbb{N}$ is the number of objects to simulate. $G(N, E)$ is
the network. $N$ is the set of nodes and $E$ is the set of edges.

procedure RecSysMod($\mathcal{P}$, $\Delta$, $\gamma$, $O$)
    $G(N, E)$ ← GenNetwork($\mathcal{P}$)
    for all Objects $i$ in $O$ do
        generate distribution $f_i$ from $\mathcal{N}(\mu_i, \sigma)$
        for each node $j \in N$ in $G$ do
            draw $f_{ij}$ from $f_i$
            if $f_{ij} < \Delta$ then
                $j_{state}$ ← $S$
            else
                $j_{state}$ ← $A$
            end if
        end for
        repeat
            for all $j$ with $j_{state} = S$ AND $\Theta_j > 0$ do
                $\hat{f}_{ij}$ ← $f_{ij} + [\Theta_j / k_j]^{(1-\gamma)}$
                if $\hat{f}_{ij} < \Delta$ then
                    $j_{state}$ ← $D$
                else
                    $j_{state}$ ← $A$
                end if
            end for
        until $|\{j \mid j_{state} = S$ AND $\Theta_j > 0\}| = 0$
    end for
end procedure

Master Equation.

We are now in the position to formulate the Master Equation for the
dynamics. As described above, two things can happen when a
non-attender is connected to an attender: a) he/she becomes an
attender too, or b) he/she becomes a denier who will not attend/favor
the item. For these two interaction types we formally write:



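$$
\begin{aligned}
S + A &\xrightarrow{\;\lambda\;} 2A, \\
S + A &\xrightarrow{\;\alpha\;} D + A.
\end{aligned} \qquad (3)
$$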



Here $\lambda$ denotes the probability that a susceptible node
connected to an attender becomes an attender too, and $\alpha$ is the
probability that a susceptible node attached to an attender becomes a
denier. To take into account the underlying network topology of the
Influence-Network it is common to introduce compartments $k$. Let
$N_k^A$ be the number of nodes in state $A$ with $k$ connections,
$N_k^S$ the number of nodes in state $S$ with $k$ connections, and
$N_k^D$ the number of nodes in state $D$ with $k$ connections.
Furthermore we define the corresponding densities
$a_k(t) = N_k^A/N_k$, $s_k(t) = N_k^S/N_k$, and $d_k(t) = N_k^D/N_k$,
where $N_k$ is the total number of nodes with $k$ connections in the
network. Since every node from $N_k$ must be in one of the three
states, $\forall t: a_k(t) + s_k(t) + d_k(t) = 1$. A weighted sum
over all $k$ compartments gives the total fraction of attenders at
time $t$, $a(t) = \sum_k P(k)\, a_k(t)$, where $P(k)$ is the degree
distribution of the network (it also holds that $a(t) = N^A(t)/N$).
The time dependence of our state variables $a_k(t)$, $d_k(t)$,
$s_k(t)$ is

$$
\begin{aligned}
\dot{a}_k(t) &= \lambda k\, s_k(t)\, \Omega, \\
\dot{d}_k(t) &= \alpha k\, s_k(t)\, \Omega, \\
\dot{s}_k(t) &= -(\alpha + \lambda)\, k\, s_k(t)\, \Omega,
\end{aligned} \qquad (4)
$$



where $\Omega$ is the density of attenders in the neighborhood of a
susceptible node with $k$ connections, averaged over $k$:

$$\Omega = \sum_k P(k)\,(k-1)\, a_k / \langle k \rangle, \qquad (5)$$

where $\langle k \rangle$ denotes the mean degree of the network. As
outlined above, $\lambda$ is the probability that a node in state $S$
transforms to state $A$ if it is connected to a node in state $A$.
This happens when $\hat{f}_{ij} \geq \Delta$. Therefore, we have
$\Delta_- \leq f_{ij} < \Delta$ where
$\Delta_- = \Delta - (1/k)^{1-\gamma}$. From this we have
$\lambda = \int_{\Delta_-}^{\Delta} f(x)\,dx$, where $f(x)$ is the
expectation distribution. Similarly we write
$\alpha = \int_{l}^{\Delta_-} f(x)\,dx$, where $l$ denotes the lower
bound of the expectation distribution $f(x)$. A crude mean field
approximation can be obtained by multiplying the right hand sides of
Eq. (4) with $P(k)$ and summing over $k$, which yields a set of
differential equations

$$
\begin{aligned}
\dot{a}(t) &= \lambda \langle k \rangle\, s(t)\, a(t), \\
\dot{d}(t) &= \alpha \langle k \rangle\, s(t)\, a(t), \\
\dot{s}(t) &= -(\alpha + \lambda) \langle k \rangle\, s(t)\, a(t),
\end{aligned} \qquad (6)
$$

which is later used to obtain analytical results for the attendance
fraction $a(t)$.
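The transition probabilities $\lambda$ and $\alpha$ defined above are integrals of the expectation distribution and can be evaluated in closed form for a normal $f(x)$. The sketch below assumes that $\sigma$ acts as the standard deviation, that the lower bound $l$ is $-\infty$, and that a single degree $k$ is representative; `transition_probabilities` is an illustrative name.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def transition_probabilities(delta, gamma, k, mu, sigma):
    """Return (lam, alpha, a0) for a node with k neighbors, one of them attending.

    lam   = mass of f(x) between Delta_- and Delta  (S -> A)
    alpha = mass of f(x) below Delta_-              (S -> D), assuming l = -infinity
    a0    = mass of f(x) at or above Delta          (first movers)
    """
    delta_minus = delta - (1.0 / k) ** (1.0 - gamma)
    below_minus = normal_cdf(delta_minus, mu, sigma)
    below_delta = normal_cdf(delta, mu, sigma)
    lam = below_delta - below_minus
    alpha = below_minus
    a0 = 1.0 - below_delta
    return lam, alpha, a0


lam, alpha, a0 = transition_probabilities(delta=0.6, gamma=0.5, k=10, mu=0.0, sigma=0.25)
```

Under these assumptions the three quantities partition the probability mass of $f(x)$, so they sum to one.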

3. METHODS 

We describe here our simulation procedures, datasets, ex- 
periments, and analytical methods. 

Simulations. 

Our simulations employ Alg. 1. As outlined in the model section, we
do not change the network topology during the dynamical processes.
We experiment with two different network types, Erdos-Renyi (ER) and
power law (PL), which are both generated by a so-called
configuration model [34]. ER and PL represent two fundamentally
different classes of networks. The former is characterized by a
typical degree scale (mean degree of the network), whereas the
latter exhibits a fat-tailed degree distribution which is scale
free. The networks are random and have no degree correlations and no
particular community structure. To obtain representative results we
use the following approach: we fix the network type, number of
nodes, number of objects, and network type relevant parameters to
draw an ER or PL network. We call this a configuration
$\mathcal{P}$. In addition, we fix the variance $\sigma$ of the
anticipation distributions. We perform each simulation on 50
different networks belonging to the same configuration $\mathcal{P}$
and on each network we simulate the dynamics 50 times. Then we
average the obtained attendance distributions over all 2500
simulations.
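The configuration model mentioned above can be illustrated by random stub matching. The following sketch assumes that self-loops and repeated edges are simply discarded, since the text does not say how they are handled; the function name and the adjacency-set representation are illustrative choices.

```python
import random


def configuration_model(degrees, rng):
    """Build a random simple graph from a degree sequence by stub matching.

    degrees: list where degrees[n] is the target degree of node n
    rng:     random.Random instance
    Returns a dict mapping node -> set of neighbors. Realized degrees can be
    slightly below the targets because self-loops and duplicates are dropped.
    """
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("the degree sum must be even")
    rng.shuffle(stubs)
    neighbors = {node: set() for node in range(len(degrees))}
    # Pair consecutive stubs; each pair becomes one undirected edge.
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


network = configuration_model([3, 2, 2, 2, 1], random.Random(7))
```

Drawing the degree sequence from a Poisson-like or a power-law distribution would then give ER-like or PL-like networks, respectively.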

Datasets. 

To show the validity of our model we use three real world
recommender data sets. MovieLens (movielens.umn.edu) is a web
service from GroupLens (grouplens.org) where ratings are recorded on
a five-star scale. The data set contains 1682 movies and 943 users.
Only 6.5% of possible votes are expressed. Netflix data set
(netflix.com): we use the Netflix grand prize data set, which
contains 480189 users and 17770 movies and also uses a five-star
scale. Lastfm data set (Lastfm.com): this data set contains social
networking, tagging, and music artist listening information from
users of the Last.fm online music system. There are 1892 users,
17632 artists, and 92834 user-listened artist relations in total. In
addition, the data set contains 12717 bi-directional user friendship
relations. These data sets are chosen because they exhibit very
different attendance distributions and thus allow us to validate our
model in different settings.

Experiments. 

Data topologies. We first investigate the simulated attendance
distributions as a function of trust $\gamma$, the anticipation
threshold $\Delta$, and the network topology. For this purpose we
simulate the dynamics on a toy network with 500 nodes and record the
final attendance number of 300 objects. The simulation is conducted
for ER and PL networks and performed as outlined in the simulations
paragraph above. In Fig. 2 and Fig. 3 we investigate the skewness
[53] of the attendance distributions and the maximal attendance
obtained for the corresponding parameter settings. The skewness of a
distribution is a measure of the asymmetry around its mean value. A
positive skewness value means that there is more weight to the left
of the mean, whereas a negative value indicates more weight to the
right of the mean.
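The sign convention for skewness can be made concrete with a small sketch. It assumes the standard moment-based definition (third central moment divided by the 1.5th power of the second), which may differ in normalization from the estimator of [53]; the function name is illustrative.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5 with central moments m2, m3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Most items attended by few users, one blockbuster: weight left of the mean.
right_tailed = skewness([1, 1, 1, 2, 10])
# Most items attended by many users, one flop: weight right of the mean.
left_tailed = skewness([1, 9, 10, 10, 10])
```

The first toy attendance list yields a positive value and the second a negative one, in line with the convention stated above.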

Fitting real data. We explore the model's ability to fit real world
recommendation attendance distributions found in the described data
sets. For this purpose we fix for the Netflix data set a network
with 480189 nodes and perform a simulation for 17770 objects. In the
MovieLens case we do the same for 943 nodes and 1682 objects, and
for the Lastfm data set we simulate on a network with 1892 nodes and
17632 objects. In the case of Lastfm we have the social network data
as well. We validate our model on that data set by two experiments:
a) we use the provided user friendship network as simulation input
and fit the attendance distribution, and b) we fit the attendance
distribution as in the MovieLens and Netflix cases with an
artificially generated network.

Mathematical analysis. We investigate the Master Equations Eq. (4)
and Eq. (6). We provide a full analytical solution for Eq. (6) and
an analytical approximation for Eq. (4) in the early spreading
stage.

4. RESULTS 

Data topologies. The landscape of attendance distributions of our
model is demonstrated in Fig. 2 and Fig. 3. To obtain these results,
simulations were performed as described in Sec. 3. The item
anticipation $f_i$ was drawn from a normal distribution with mean
values $\mu_i \in U(-0.1, 0.1)$ and variance $\sigma = 0.25$ fixed
for all items. Both networks have 500 nodes. In the Erdos-Renyi
case, we used a wiring probability $p = 0.03$ between nodes. The
power law network was drawn with an exponent $\delta = 2.25$. The
simulated attendance distributions in Fig. 2 and Fig. 3 show a wide
range of different patterns for both ER and PL Influence-Networks.
In particular, both network types can serve as a basis for
attendance distributions with both positive and negative skewness.
Therefore, the observed fat-tailed distributions are not a result of
the heterogeneity of a scale free network but are emergent
properties of the dynamics produced by our model. The parameter
region for highly positively skewed distributions is the same for
both network types. The parameters $\gamma$ and $\Delta$ can be
tuned so that all items are attended by everybody or all items are
attended by nobody. While not relevant for simulating realistic
attendance distributions, these extreme cases help to understand the
model's flexibility.

Figure 2: Skewness of the attendance distributions as a function of
trust $\gamma$ and the critical anticipation threshold $\Delta$ for
Erdos-Renyi networks with 500 nodes and 300 simulated items.

Figure 3: Skewness of the attendance distributions as a function of
trust $\gamma$ and the critical anticipation threshold $\Delta$ for
power-law networks with 500 nodes and 300 simulated items.

Fitting real data. We fit real world recommender data from
MovieLens, Netflix, and Lastfm, with results reported in Fig. 4,
Fig. 5, Fig. 6, Fig. 7, and Tab. 1. The real and simulated
distributions are compared using the Kullback-Leibler (KL)
divergence [29]. We report the mean, median, maximum, and minimum of
the simulated and real attendance distributions. Trust $\gamma$,
anticipation threshold $\Delta$, and anticipation distribution
variance $\sigma$ are reported in the figure captions. We also
compare the averaged mean degree, maximum degree, minimum degree,
and clustering coefficient of the real Lastfm social network and the
networks obtained to fit the data. Results are reported in Tab. 2
and Fig. 8. Note that the parameter values obtained in this way can
also be useful in real applications: assuming that our social
opinion formation model is valid, one could, for example, detect a
decline of the overall trust value in an online community.
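The comparison of real and simulated attendance distributions can be sketched as follows. This is an illustration only: the binning used for the reported KL values is not stated in the text, so the sketch assumes one bin per distinct attendance number and a small floor `eps` for bins that are empty in the simulated histogram; both function names are hypothetical.

```python
import math
from collections import Counter


def attendance_histogram(attendances):
    """Normalized histogram: attendance number -> fraction of items with that number."""
    counts = Counter(attendances)
    total = len(attendances)
    return {value: count / total for value, count in counts.items()}


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) between two normalized histograms."""
    total = 0.0
    for value, p_x in p.items():
        if p_x > 0.0:
            q_x = max(q.get(value, 0.0), eps)  # assumption: floor empty bins of Q
            total += p_x * math.log(p_x / q_x)
    return total


real = attendance_histogram([1, 1, 2, 3])
simulated = attendance_histogram([1, 2, 2, 3])
divergence = kl_divergence(real, simulated)
```

Note that the divergence is not symmetric, so the order of the real and simulated histograms matters when values are compared across data sets.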

Figure 4: Fit of the MovieLens attendance distribution with trust
$\gamma = 0.50$, critical anticipation threshold $\Delta = 0.6$,
anticipation distribution variance $\sigma = 0.25$, and power law
network with exponent $\delta = 2.25$, 943 nodes, and 1682 simulated
objects.

Figure 5: Fit of the Netflix attendance distribution with trust
$\gamma = 0.52$, critical anticipation threshold $\Delta = 0.72$,
anticipation distribution variance $\sigma = 0.27$, and power law
network with exponent $\delta = 2.2$, 480189 nodes, and 17770
simulated objects.

Mathematical analysis. Eq. (6) can be solved analytically. We have
$\forall t: a(t) + s(t) + d(t) = 1$ with the initial conditions for
the first movers $a(0) = a_0 = \int_{x \geq \Delta} f(x)\,dx$,
$s(0) = 1 - a(0)$, and $d(0) = 0$. In the following we use the
bra-ket notation $\langle x \rangle$ to represent the average of a
quantity $x$. Standard methods can now be used to arrive at²



$$a(t) = \frac{(\tau \langle k \rangle)^{-1} \exp(t/\tau)}{(\alpha + \lambda)\,[\exp(t/\tau) - 1] + (\tau \langle k \rangle a_0)^{-1}}. \qquad (7)$$



Here $\tau$ is the time scale of the propagation, which is defined
as

$$\tau = (\alpha a_0 \langle k \rangle + \lambda \langle k \rangle)^{-1}. \qquad (8)$$

This is similar to the time scale
$\tau = (\lambda \langle k \rangle)^{-1}$ in the well known SI model
[38] [6]. Eq. (7) can be very useful in predicting the average
behavior of users in a recommender system.
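The closed-form solution can be cross-checked against a direct numerical integration of the mean field equations. The sketch below assumes the forms of Eq. (6), Eq. (7), and Eq. (8) given in the text and uses a plain forward-Euler scheme; the parameter values are arbitrary illustrations, not fitted values.

```python
import math


def attendance_closed_form(t, lam, alpha, mean_k, a0):
    """Attendance fraction a(t) from Eq. (7) with the time scale of Eq. (8)."""
    tau = 1.0 / (alpha * a0 * mean_k + lam * mean_k)
    growth = math.exp(t / tau)
    numerator = growth / (tau * mean_k)
    denominator = (alpha + lam) * (growth - 1.0) + 1.0 / (tau * mean_k * a0)
    return numerator / denominator


def attendance_numeric(t_end, lam, alpha, mean_k, a0, steps=20000):
    """Forward-Euler integration of Eq. (6), starting from s(0) = 1 - a0, d(0) = 0."""
    a, s = a0, 1.0 - a0
    dt = t_end / steps
    for _ in range(steps):
        flow = mean_k * s * a * dt   # <k> s(t) a(t) dt
        a += lam * flow              # S -> A
        s -= (alpha + lam) * flow    # S leaves towards A and D
    return a


params = dict(lam=0.1, alpha=0.3, mean_k=10.0, a0=0.01)
analytic = attendance_closed_form(2.0, **params)
numeric = attendance_numeric(2.0, **params)
```

For long times the closed form saturates at $(\lambda + \alpha a_0)/(\alpha + \lambda)$, which is below one whenever $\alpha > 0$ because deniers remove susceptible nodes from the process.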

² We give here only the solution for $a(t)$ because we are
mainly interested in the attendance dynamics.

Figure 6: Fit of the Lastfm attendance distribution with trust
$\gamma = 0.4$, critical anticipation threshold $\Delta = 0.8$,
anticipation distribution variance $\sigma = 0.24$, and real Lastfm
user friendship network with 1892 nodes and 17632 simulated objects.

        KL      Med       Mean        Max             Min
ML      0.046   27/26     59/60       583/485         1/1
NF      0.030   561/561   5654/5837   232944/193424   3/16
LFM1    0.05    1/1       5.3/5.2     611/503         1/1
LFM2    0.028   1/1       5.3/5.8     611/547         1/1

Table 1: Simulation results. ML: MovieLens, NF: Netflix, LFM1: Lastfm
with real network, LFM2: Lastfm with simulated network, KL:
Kullback-Leibler divergence, Med: median, Mean, Max: maximal
attendance (data/simulated), Min: minimal attendance
(data/simulated).

        Mean degree   Max degree   Exponent $\delta$   C
LFM1    13.4          119          2.3                 0.186
LFM2    12.0          118          2.25                0.06

Table 2: Mean, minimum, maximum degree, clustering coefficient C, and
estimated exponent $\delta$ of the real (LFM1) and simulated (LFM2)
social network for the Lastfm data set.

Figure 7: Fit of the Lastfm attendance distribution with trust
$\gamma = 0.6$, critical anticipation threshold $\Delta = 0.8$,
anticipation distribution variance $\sigma = 0.24$, and power law
network with exponent $\delta = 2.25$, 1892 nodes, and 17632
simulated objects.

Figure 8: Log-log plot of real (red) and simulated (blue) social
network degree distribution $P(k)$ for the Lastfm data set. Inset:
plot of the cumulative degree distribution.

Since Eq. (4) is not accessible to a full analytical solution, we
investigate it for the early stage of the dynamics. Assuming
$a(0) = a_0 \gg 0$, we can neglect the dynamics of $d(t)$ to obtain

$$\dot{\Omega}(t) = \lambda \left( \frac{\langle k^2 \rangle}{\langle k \rangle} - 1 \right) \Omega(t). \qquad (9)$$

In addition, Eq. (4) yields

$$
\begin{aligned}
\dot{a}_k(t) &= \lambda k\, (1 - a_k(t))\, \Omega(t), \\
\dot{s}_k(t) &= -(\alpha + \lambda)\, k\, (1 - a_k(t))\, \Omega(t).
\end{aligned}
$$

Neglecting terms of order $a_k^2(t)$ and summing the solution of
$a_k(t)$ over $P(k)$, we get a result for the early spreading stage

$$a(t) = a(0) \left( 1 + \tau \lambda\, (\exp(t/\tau) - 1) \right), \qquad (10)$$

with the time scale
$\tau = \langle k \rangle / [\lambda (\langle k^2 \rangle - \langle k \rangle)]$.
The obtained time scale $\tau$, valid in the early stage of the
opinion spreading, is clearly dominated by the network
heterogeneity. This result is in line with known disease models,
e.g., SI and SIR [38] [6]. We emphasize that Eq. (10) is valuable in
predicting users' behavior of a recommender system in an early
stage.
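The step from the definition of $\Omega$ to the early-stage growth rate can be written out explicitly. The following is a sketch of that intermediate step, assuming the forms of Eq. (4) and Eq. (5) given above and keeping only terms of first order in $a_k$, as the text does.

```latex
\begin{aligned}
\dot{\Omega}(t)
  &= \frac{1}{\langle k \rangle} \sum_k P(k)\,(k-1)\,\dot{a}_k(t)
   \approx \frac{\lambda\,\Omega(t)}{\langle k \rangle} \sum_k P(k)\,(k-1)\,k \\
  &= \lambda\,\frac{\langle k^2 \rangle - \langle k \rangle}{\langle k \rangle}\,\Omega(t)
   \quad\Longrightarrow\quad
   \Omega(t) = \Omega(0)\, e^{t/\tau},
   \qquad
   \tau = \frac{\langle k \rangle}{\lambda\,(\langle k^2 \rangle - \langle k \rangle)} .
\end{aligned}
```

Because $\langle k^2 \rangle$ grows with the width of the degree distribution, a more heterogeneous network shortens $\tau$ in this approximation.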

5. DISCUSSION 

Social influence and our peers are known to form and influence many
of our opinions and, ultimately, decisions. We propose here a simple
model which is based on heterogeneous agent expectations, a social
network, and a formalized social influence mechanism. We analyze the
model by numerical simulations and by a master equation approach
which is particularly suitable to describe the initial phase of the
social "contagion". The proposed model is able to generate a wide
range of different attendance distributions, including those
observed in popular real systems (Netflix, Lastfm, and MovieLens).
In addition, we showed that these patterns are emergent properties
of the dynamics and not imposed by the topology of the underlying
social network. Of particular interest is the case of Lastfm, where
the underlying social network is known. Calibrating the observed
attendance distribution against the model then leads not only to
social influence parameters but also to a degree distribution of the
social network which agrees with that of the true social network.

The Kullback-Leibler (KL) divergences between the simulated and real
attendance distributions are at most 0.05 in all cases (Tab. 1),
thus demonstrating a good fit. However, the maximum attendances
could not be reproduced exactly by the model. One reason may be
missing degree correlations in the simulated networks, in contrast
to real networks where positive degree correlations (so-called
degree assortativity) are common. For the Lastfm user friendship
network we observe a higher clustering coefficient $C \approx 0.18$
compared to the clustering coefficient $C \approx 0.06$ in the
simulated network. To compensate for this, a higher trust parameter
$\gamma$ is needed to fit the real Lastfm attendance distribution
with simulated networks ($\gamma = 0.6$ in Fig. 7 compared to
$\gamma = 0.4$ in Fig. 6).

We are aware that our statistics to validate the model are not
complete. But we are confident that our approach points to a
fruitful research direction to understand recommender systems' data
as a socially driven process.

The proposed model can be a first step towards a data 
generator to simulate bipartite user-object data with real- 
world data properties. This could be used to test and val- 
idate new recommender algorithms and methods. Future 
research directions may expand the proposed model to gen- 
erate ratings within a predefined scale. Moreover, it could 
be very interesting to investigate the model in the scope of 
social imitation [36] . 

6. REFERENCES 

[1] G. Adomavicius and A. Tuzhilin. Toward the next 
generation of recommender systems: A survey of the 
state-of-the-art and possible extensions. IEEE 
transactions on knowledge and data engineering, pages 
734-749, 2005. 

[2] R. Andersen, C. Borgs, J. Chayes, U. Feige,
A. Flaxman, A. Kalai, V. Mirrokni, and
M. Tennenholtz. Trust-based recommendation
systems: an axiomatic approach. In Proceedings of the
17th international conference on World Wide Web,
pages 199-208. ACM, 2008.

[3] E. Anderson and L. Salisbury. The formation of 
market-level expectations and its covariates. Journal 
of Consumer Research, 30(1):115-124, 2003. 

[4] J. Arndt. Role of product-related conversations in the 
diffusion of a new product. Journal of Marketing 
Research, 4(3):291-295, 1967. 

[5] M. Balabanovic and Y. Shoham. Fab: content-based, 
collaborative recommendation. Communications of the 
ACM, 40(3):66-72, 1997. 

[6] A. Barrat, M. Barthélemy, and A. Vespignani.
Dynamical Processes on Complex Networks.
Cambridge University Press, New York, NY, USA,
2008.

[7] S. Bergamaschi, F. Guerra, and B. Leiba. Information
overload. Internet Computing, IEEE, 14(6):10-13,
2010.

[8] D. Billsus and M. Pazzani. Learning collaborative 
information filters. In Proceedings of the Fifteenth 
International Conference on Machine Learning, 
volume 54, page 48, 1998. 



[9] A. Birukov, E. Blanzieri, and P. Giorgini. Implicit: An 
agent-based recommendation system for web search. 
In Proceedings of the fourth international joint 
conference on Autonomous agents and multiagent 
systems, pages 618-624. ACM, 2005. 

[10] M. Blattner. B-rank: A top N recommendation 
algorithm. In Proceedings of The International 
Multi-Conference on Complexity, Informatics and 
Cybernetics (IMCIC 2010), volume 1, pages 337-341, 
Orlando, USA, 2010. 

[11] J. Breese, D. Heckerman, and C. Kadie. Empirical 
analysis of predictive algorithms for collaborative 
filtering. In Proceedings of the 14th Annual Conference 
on Uncertainty in Artificial Intelligence (UAI-98), 
pages 43-52, San Francisco, CA, 1998. Morgan 
Kaufmann. 

[12] P. Brusilovsky, A. Kobsa, and W. Nejdl. The adaptive
web: methods and strategies of web personalization.
Springer-Verlag New York Inc, 2007.

[13] G. Caron-Lormier, R. Humphry, D. Bohan, C. Hawes, 
and P. Thorbek. Asynchronous and synchronous 
updating in individual-based models. Ecological
Modelling, 212(3-4):522-527, 2008.

[14] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov,
D. Netes, and M. Sartin. Combining content-based
and collaborative filters in an online newspaper. In 
Proc. ACM SIGIR 99, Workshop Recommender 
Systems: Algorithms and Evaluation, 1999. 

[15] H. Drachsler, T. Bogers, R. Vuorikari, K. Verbert,
E. Duval, N. Manouselis, G. Beham, S. Lindstaedt,
H. Stern, M. Friedrich, et al. Issues and considerations
regarding sharable data sets for recommender systems
in technology enhanced learning. Procedia Computer
Science, 1(2):2849-2858, 2010.

[16] F. Fouss, A. Pirotte, J. Renders, and M. Saerens. 
Random-walk computation of similarities between 
nodes of a graph with application to collaborative 
recommendation. Knowledge and Data Engineering, 
IEEE Transactions on, 19(3):355-369, 2007. 

[17] F. Fouss, L. Yen, A. Pirotte, and M. Saerens. An 
experimental investigation of graph kernels on a 
collaborative recommendation task. In Data Mining, 
2006. ICDM'06. Sixth International Conference on, 
pages 863-868. IEEE, 2007. 

[18] W. Geyer, J. Freyne, B. Mobasher, S. Anand, and 
C. Dugan. 2nd workshop on recommender systems 
and the social web. In Proceedings of the fourth ACM 
conference on Recommender systems, pages 379-380. 
ACM, 2010. 

[19] J. Gleeson. High-accuracy approximation of
binary-state dynamics on networks. Physical Review
Letters, 107(6):68701, 2011. 

[20] D. Goldberg, B. D. Nichols, and D. Terry. Using 
collaborative filtering to weave an information 
tapestry. Commun. ACM, 35(12):61-70, 1992. 

[21] N. Good, J. Schafer, J. Konstan, A. Borchers,
B. Sarwar, J. Herlocker, and J. Riedl. Combining
collaborative filtering with personal agents for better 
recommendations. In Proc. Conf. Am. Assoc. Artificial 
Intelligence (AAAI-99), pages 439-446, USA, 1999. 

[22] I. Guy, A. Jaimes, P. Agullo, P. Moore, P. Nandy,
C. Nastar, and H. Schinzel. Will recommenders kill
search?: recommender systems-an industry
perspective. In Proceedings of the fourth ACM 
conference on Recommender systems, pages 7-12. 
ACM, 2010. 

[23] T. Hennig-Thurau, K. Gwinner, G. Walsh, and 
D. Gremler. Electronic word-of-mouth via 
consumer-opinion platforms: What motivates 
consumers to articulate themselves on the Internet? 
Journal of Interactive Marketing, 18(1):38-52, 2004.

[24] J. Herlocker, J. Konstan, L. Terveen, and J. Riedl. 
Evaluating collaborative filtering recommender 
systems. ACM Trans. Inf. Syst., 22(1):5-53, 2004.

[25] D. Jannach, W. Geyer, J. Freyne, S. Anand,
C. Dugan, B. Mobasher, and A. Kobsa. Recommender
Systems & the Social Web. In Proceedings of the 2009 
ACM Conference on Recommender Systems, RecSys 
2009, New York, NY, USA, October 23-25, 2009. 
ACM, 2009. 

[26] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins.
Eigentaste: a constant time collaborative filtering
algorithm. Information Retrieval, (4):133-151, 2001.

[27] Y. Kim and J. Srivastava. Impact of social influence in
e-commerce decision making. Proceedings of the ninth
international conference on Electronic commerce
(ICEC), pages 293-302, 2007.

[28] J. Konstan, B. Miller, D. Maltz, J. Herlocker, and
L. Gordon. Grouplens: Applying collaborative filtering
to Usenet news. Comm. ACM, 40(3):77-87, 1997.

[29] S. Kullback and R. Leibler. On information and
sufficiency. The Annals of Mathematical Statistics,
22(1):79-86, 1951.

[30] G. Linden, B. Smith, and J. York. Amazon.com
recommendations: Item-to-item collaborative filtering.
IEEE Internet Computing, 7(1):76-80, 2003.

[31] M. Mason, R. Dyer, and M. Norton. Neural
mechanisms of social influence. Organizational
Behavior and Human Decision Processes,
110(2):152-159, 2009.

[32] P. Massa and P. Avesani. Trust-aware collaborative
filtering for recommender systems. On the Move to
Meaningful Internet Systems 2004: CoopIS, DOA, and
ODBASE, pages 492-508, 2004.

[33] P. Massa and B. Bhattacharjee. Using trust in
recommender systems: an experimental analysis. Trust
Management, pages 221-235, 2004.

[34] M.E.J. Newman. Structure and function of complex
networks. SIAM Review, 45:167-256, 2003.

[35] P. Melville, R. Mooney, and R. Nagarajan.
Content-boosted collaborative filtering for improved
recommendations. In Proc. 18th Nat'l Conf. Artificial
Intelligence, 2002.

[36] Q. Michard and J. Bouchaud. Theory of collective
opinion shifts: from smooth trends to abrupt swings.
The European Physical Journal B - Condensed Matter
and Complex Systems, 47(1):151-159, 2005.

[37] B. Mirza, B. Keller, and N. Ramakrishnan. Studying
recommendation algorithms by graph analysis.
Journal of Intelligent Information Systems,
20(2):131-160, 2003.

[38] M. Newman. Spread of epidemic disease on networks.
Physical Review E, 66(1):016128, 2002.



[39] J. O'Donovan and B. Smyth. Trust in recommender 
systems. In Proceedings of the 10th international 
conference on Intelligent user interfaces, pages 
167-174. ACM, 2005. 

[40] M. Pazzani and D. Billsus. Content-based
recommendation systems. Lecture Notes in Computer
Science, 4321:325-341, 2007.

[41] P. Resnick, N. Iakovou, M. Suchak, P. Bergstrom, and
J. Riedl. Grouplens: An open architecture for
collaborative filtering of netnews. In Proc. Computer
Supported Cooperative Work Conf., 1994.

[42] P. Resnick and H. Varian. Recommender systems. 
Commun. ACM, 40:56-58, March 1997. 

[43] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, 
editors. Recommender Systems Handbook. Springer, 
2011. 

[44] M. Richins and T. Root-Shaffer. The role of
involvement and opinion leadership in consumer
word-of-mouth: An implicit model made explicit.
Advances in Consumer Research, 15:32-36, 1988.

[45] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl.
Item-based collaborative filtering recommendation 
algorithms. In WWW '01: Proceedings of the 10th 
international conference on World Wide Web, pages 
285-295, New York, NY, USA, 2001. ACM Press. 

[46] K. Sugiyama, K. Hatano, and M. Yoshikawa. Adaptive 
web search based on user profile constructed without 
any effort from users. In Proceedings of the 13th 
international conference on World Wide Web, pages 
675-684. ACM, 2004. 

[47] F. Walter, S. Battiston, and F. Schweitzer. A model of 
a trust-based recommendation system on a social 
network. Autonomous Agents and Multi-Agent 
Systems, 16(1):57-74, 2008.

[48] G. Webb, M. Pazzani, and D. Billsus. Machine 
learning for user modeling. User Modeling and 
User-Adapted Interaction, 11(1):19-29, 2001.

[49] W. Whyte Jr. The web of word of mouth, 1954. 

[50] T. Zhang and V. Iyengar. Recommender systems 
using linear classifiers. The Journal of Machine 
Learning Research, 2:334, 2002. 

[51] Y. Zhang, M. Blattner, and Y. Yu. Heat conduction 
process on community networks as a recommendation 
model. Physical Review Letters, 99(15):154301, 2007.

[52] T. Zhou, J. Ren, M. Medo, and Y. Zhang. Bipartite 
network projection and personal recommendation. 
Physical Review E, 76(4):46115, 2007. 

[53] D. Zwillinger and S. Kokoska. CRC standard
probability and statistics tables and formulae. CRC,
2000. 



